Entity Detection

Entity Detection identifies and extracts key entities from content in submitted audio.

detect_entities — boolean. Default: false

Available for pre-recorded and streaming audio, in English (all available regions).


When Entity Detection is enabled, the Punctuation feature will be enabled by default.

Enable Feature

To enable Entity Detection, add a detect_entities parameter set to true in the query string when you call Deepgram’s API:

detect_entities=true&punctuate=true
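If you are building the request URL in code rather than by hand, the same query string can be assembled with the standard library. A minimal Python sketch — the helper name build_listen_url is ours for illustration, not part of Deepgram's API:

```python
from urllib.parse import urlencode

# Hypothetical helper: appends the Entity Detection parameters to the
# /v1/listen endpoint. detect_entities implies punctuate, but we pass
# both explicitly, matching the query string shown above.
def build_listen_url(base="https://api.deepgram.com/v1/listen", **extra):
    params = {"detect_entities": "true", "punctuate": "true"}
    params.update(extra)
    return f"{base}?{urlencode(params)}"

print(build_listen_url())
```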


To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.

cURL
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
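The same request can be made from Python without any third-party dependencies. A sketch using the standard library's urllib — the function name build_request is ours, and the request is only sent when you call urlopen with a valid key and audio file:

```python
import urllib.request

# Hypothetical helper: builds a POST to /v1/listen with the audio file as
# the raw request body, mirroring curl's --data-binary @youraudio.wav.
def build_request(api_key, audio_path,
                  url="https://api.deepgram.com/v1/listen"
                      "?detect_entities=true&punctuate=true"):
    with open(audio_path, "rb") as f:
        data = f.read()
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )

# To actually send it (requires a valid key and a local WAV file):
# with urllib.request.urlopen(build_request("YOUR_DEEPGRAM_API_KEY", "youraudio.wav")) as resp:
#     print(resp.read().decode())
```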

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

JSON
{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives": []
      }
    ]
  }
}

Let’s look more closely at the alternatives object:

JSON
"alternatives": [
  {
    "transcript": "Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
    "confidence": 0.9816771,
    "words": [...],
    "entities": [
      {
        "label": "NAME",
        "value": " Scott Stephenson",
        "confidence": 0.9999924,
        "start_word": 6,
        "end_word": 8
      },
      {
        "label": "ORG",
        "value": " Deepgram",
        "confidence": 0.9999757,
        "start_word": 10,
        "end_word": 11
      },
      {
        "label": "CARDINAL_NUM",
        "value": "one",
        "confidence": 1,
        "start_word": 186,
        "end_word": 187
      },
      ...
    ]
  }
]

In this response, we see that each alternative contains:

  • transcript: Transcript for the audio being processed.
  • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: Array of objects, one per word in the transcript, each with its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
  • entities: Array of objects, one per entity detected in the audio being processed.

And we see that each entities object contains:

  • label: Type of entity identified.
  • value: Text of entity identified.
  • confidence: Floating point value between 0 and 1 that indicates entity reliability. Larger values indicate higher confidence.
  • start_word: Word position at which the entity begins, counted from the start of the transcript.
  • end_word: Word position at which the entity ends. In the example above, "Scott Stephenson" spans words 6 through 8.

All entities are available in English.
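Once the response is parsed into a Python dict (for example with json.loads), the entities can be collected by walking the structure shown above. A sketch — the function name extract_entities is ours, and the field names follow the response documented in this section:

```python
# Collects (label, value, confidence) tuples from a parsed /v1/listen
# response, walking results -> channels -> alternatives -> entities.
def extract_entities(response):
    found = []
    for channel in response["results"]["channels"]:
        for alt in channel["alternatives"]:
            for ent in alt.get("entities", []):
                # Entity values may carry a leading space, as in the
                # sample response above, so trim for display.
                found.append((ent["label"], ent["value"].strip(), ent["confidence"]))
    return found
```

For example, running extract_entities over the sample response above would yield tuples such as ("NAME", "Scott Stephenson", 0.9999924) and ("ORG", "Deepgram", 0.9999757).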

Identifiable Entities

View all options here: Supported Entity Types

Use Cases

Some examples of uses for Entity Detection include:

  • Customers who want to improve Conversational AI and Voice Assistant experiences by triggering particular workflows and responses based on identified names, addresses, locations, and other key entities.
  • Customers who want to enhance customer service and user experience by extracting meaningful and relevant information about key entities such as a person, organization, email, and phone number.
  • Customers who want to derive meaningful and actionable insights from the audio data based on identified entities in conversations.